
xds: implement extension config discovery for HCM #11826

Merged Jul 23, 2020 (50 commits)

Conversation

@kyessenov (Contributor) commented Jun 30, 2020:

Commit Message:
Implements a filter config discovery client. Modeled after RDS with a few differences:

  • subscriptions are shared between unique config providers;
  • streams are started via one of two targets: the subscription target for a warming case and the provider target for a non-warming case.
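The sharing model in the first bullet can be sketched roughly as follows (illustrative types only, not Envoy's actual classes): a registry hands out one shared subscription per resource name, holding only weak references so the subscription is destroyed along with its last provider.

```cpp
#include <map>
#include <memory>
#include <string>

// Illustrative sketch; names do not match Envoy's implementation.
struct Subscription {
  explicit Subscription(std::string name) : resource_name(std::move(name)) {}
  std::string resource_name;
};

class SubscriptionRegistry {
public:
  // Providers asking for the same resource name share one live subscription.
  std::shared_ptr<Subscription> getOrCreate(const std::string& name) {
    auto it = subscriptions_.find(name);
    if (it != subscriptions_.end()) {
      if (auto existing = it->second.lock()) {
        return existing;
      }
    }
    auto created = std::make_shared<Subscription>(name);
    // weak_ptr keeps the registry from extending the subscription's lifetime.
    subscriptions_[name] = created;
    return created;
  }

private:
  std::map<std::string, std::weak_ptr<Subscription>> subscriptions_;
};
```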

Remaining items to finish:

  • wire stats to the client
  • unit and integration tests
  • documentation
  • (later PR) config dump

One interesting issue with contextual validation is that adding a context may invalidate a resource in xDS (e.g. adding a reference to a filter config as a terminal filter in the filter chain after the non-terminal filter config is received).
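A minimal illustration of that hazard (hypothetical names, not the PR's actual validation code): whether a received filter config is acceptable depends on the chain slot that references it, so a config that validated in isolation can become invalid once the context is known.

```cpp
#include <string>

// Hypothetical sketch of context-dependent validation: a filter factory's
// terminal-ness must match the position it is referenced from in the chain.
struct FilterFactory {
  std::string name;
  bool is_terminal;
};

// Returns true if the factory may occupy the given chain slot. A terminal
// filter (e.g. a router) must sit in the terminal slot, and only there.
bool validateInContext(const FilterFactory& factory, bool slot_is_terminal) {
  return factory.is_terminal == slot_is_terminal;
}
```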

Risk Level: low (the code is not activated unless the config is used)
Testing: manual for now
Docs Changes: needed
Release Notes: add a filter config discovery client
Fixes: #7867

Signed-off-by: Kuat Yessenov <kuat@google.com>
@mattklein123 (Member):

I won't be able to review this before vacation. @htuch can you maybe take a quick pass next week and then I can look the week after?

@htuch (Member) commented Jul 1, 2020:

Yeah, sure, will look at this early next week.

@htuch (Member) commented Jul 6, 2020:

@kyessenov before diving into the details, I guess it might be worth discussing whether it's worth having another config-specific management system for FilterConfigDS. I.e. there is a ton of conceptual and implementation complexity in both RDS and FilterConfigDS in terms of how warming, shared configuration, thread-local state and provider/subscriber notions are handled. Can we unify them?

I think the one thing that might make some abstraction and reduction of complexity hard is that RDS needs to deal with VHDS and SRDS as well. Nonetheless, it's worrying that we are continuing to expand the amount of config state management complexity when we have these common patterns in RDS and FilterConfigDS. Any thoughts on how to make this better?

@kyessenov (Contributor Author):

@htuch RDS is mired in the complexity of SRDS and VHDS, so it's really hard to reuse its pieces. The sharing model is also rather strange (shared provider with shared subscriptions), so it would need some refactoring to become reusable. I thought about that, but it's so intertwined that it's hard not to break.

The warming bit is interesting since it's not present in any other xDSes, so that's novel.

I agree that the majority of the code is just glue, but not sure if it's going to make it simpler to understand if we abstract all the glue into some framework.

Signed-off-by: Kuat Yessenov <kuat@google.com>
@htuch (Member) left a comment:

Thanks. Overall, structurally I think this makes sense, but I'd like to see what can be done to make it easier to grok and follow the warming/initialization flow. Flushing some comments.
/wait

include/envoy/router/filter_config_provider.h — resolved review thread
include/envoy/router/filter_config_provider.h — resolved review thread
* Validate if the route configuration can be applied in the context of the filter manager.
* @param Server::Configuration::NamedHttpFilterConfigFactory a filter factory to validate in the
* context of the filter manager filter chains.
*/
Member:

It's a bit unclear to me what this does. Are you talking about the per-route filter config?

Contributor Author:

Sorry, this is about checking whether a filter configuration is terminal.

source/common/router/BUILD — resolved review thread

void DynamicFilterConfigProviderImpl::validateConfig(
Server::Configuration::NamedHttpFilterConfigFactory& factory) {
if (factory.isTerminalFilter() && !require_terminal_) {
Member:

Why are we validating this down here? I'd have thought it would be sufficient to validate the terminal filter condition in HCM..

Contributor Author:

Config is rejected if it cannot be applied. Right now it's just terminal checks, but I imagine there'll be more. We can restructure the code to use callbacks if it's more clear that way.

Member:

I'm still not sure where this additional config validation requirement comes from, in particular as a first-class API feature?

source/common/router/filter_config_discovery_impl.cc — resolved review thread
source/common/router/filter_config_discovery_impl.cc — resolved review thread
source/common/router/filter_config_discovery_impl.h — resolved review thread
Signed-off-by: Kuat Yessenov <kuat@google.com>
@htuch (Member) left a comment:

Looks good! Please go ahead with the current design. There are a few threads to discuss further while doing so.
/wait

source/common/router/filter_config_discovery_impl.cc — resolved review thread
"Server sent a delta FilterConfigDS update attempting to remove a resource (name: "
"{}). Ignoring.",
removed_resources[0]);
}
Member:

Did we discuss previously how delta xDS should look? I think what you have is consistent with RDS/EDS, but I'm also tempted to prefer to make delta xDS behave more consistently with new APIs. WDYT? CC @markdroth

Contributor Author:

We haven't discussed. I think for now it's simple since it's just a single resource per subscription. Eventually we might want to combine many resources per xDS request.
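The single-resource-per-subscription handling discussed in this thread could be sketched as follows (illustrative types only, not the PR's actual code): a removal is counted and ignored rather than clearing the active config, consistent with the RDS/EDS behavior in the quoted snippet.

```cpp
#include <optional>
#include <string>
#include <vector>

// Illustrative delta-xDS update: resource names added/updated and removed.
struct DeltaUpdate {
  std::vector<std::string> added;
  std::vector<std::string> removed;
};

class SingleResourceState {
public:
  void onDeltaUpdate(const DeltaUpdate& update) {
    if (!update.removed.empty()) {
      // Removals are logged and ignored; the active config is kept.
      ++ignored_removals_;
    }
    if (!update.added.empty()) {
      // One watched resource per subscription, so take the first entry.
      current_ = update.added.front();
    }
  }

  std::optional<std::string> current() const { return current_; }
  int ignoredRemovals() const { return ignored_removals_; }

private:
  std::optional<std::string> current_;
  int ignored_removals_{0};
};
```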

source/common/router/filter_config_discovery_impl.cc — resolved review thread
Signed-off-by: Kuat Yessenov <kuat@google.com>
@repokitteh-read-only repokitteh-read-only bot added api and removed waiting labels Jul 8, 2020
@repokitteh-read-only (bot):

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

Caused by: #11826 was synchronized by kyessenov.

@htuch (Member) left a comment:

Looks good, just need to figure out the validation story.

// Otherwise, mark ready immediately and start the subscription on initialization. A default
// config is expected in the latter case.
if (!apply_without_warming) {
factory_context.initManager().add(subscription->initTarget());
Member:

Is this server config factory context accurately modeling the listener warming chain? I.e., is this going to ensure listener warming?

Contributor Author:

This is the context from HCM which I assume is listener context. I think RDS uses the init manager from this context. Let me know if I misunderstand something.
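The warming distinction in the snippet above can be illustrated with a toy init manager (hypothetical classes, not Envoy's actual Init::Manager): in the warming case, the provider registers a target that only reports ready once the first filter config has arrived, so the listener is held back until then.

```cpp
#include <functional>
#include <vector>

// Toy model: an init manager is ready only once every registered target
// reports ready.
class ToyInitManager {
public:
  void add(std::function<bool()> target) { targets_.push_back(std::move(target)); }

  bool ready() const {
    for (const auto& target : targets_) {
      if (!target()) {
        return false;
      }
    }
    return true;
  }

private:
  std::vector<std::function<bool()>> targets_;
};

// A warming provider flips this flag when its first config is received.
struct ToyProvider {
  bool config_received{false};
};
```

In the non-warming case the provider would simply not call `add()` and would serve its default config immediately, which is why a default config is required there.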

@kyessenov (Contributor Author):

OK, I'm stuck on figuring out why integration test framework flakes for me. I'm seeing delayed connection error 111 on a test request sometimes, and I have no idea why the connection is rejected.

@kyessenov (Contributor Author):

The integration-test fiasco might be linked to #11871. The listener is marked as ready (in both waitUntilListenersReady and the stats), yet the connection is refused.

Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov (Contributor Author):

Tested a few times with 100 iterations. --runs_per_test=1000 causes timeout issues (I guess it's too much load for my 32 cores).

@mattklein123 (Member):

Does it time out in your test or elsewhere?

@kyessenov (Contributor Author):

I haven't tried other tests. I just ran it again, and got 2/1000 failures:

[2020-07-23 17:22:47.059][12][critical][assert] [test/integration/http_integration.cc:389] assert failure: result. Details: Timed out waiting for new stream.
[2020-07-23 17:22:47.148][12][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #2: Envoy::HttpIntegrationTest::waitForNextUpstreamRequest() [0x4faf00]

It looks like it times out at the default timeout, which is 10s. Maybe I need to bump it?

@mattklein123 (Member):

2/1000 is probably OK to merge but unfortunately we have a real issue right now with test flakes and this is bound to fail in super slow CI. cc @alyssawilk for thoughts on ^.

Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov (Contributor Author):

Thanks. I agree that the flakes should be burnt with fire. I find it very hard to write a non-flaky test in the framework though.

I found a way around it. I made HCM do a direct response, and that fixed the flake. There must be another race between the upstream being ready, and the readiness checks I have in my test.

@kyessenov (Contributor Author):

Somewhat relevant question: is there a way to simulate the integration tests with an out-of-process test driver? I like that the test is very sensitive to timing, but would prefer to use the common agents (which are in golang usually). I think I'd need some way to receive a fast update about internal events (server initialized, metric counter incremented). Currently, we depend on polling which does not allow detecting O(ms) races.

@mattklein123 (Member) left a comment:

Thanks for working through the flakes and on this awesome feature!

@mattklein123 mattklein123 merged commit e8216a8 into envoyproxy:master Jul 23, 2020
@mattklein123 (Member):

@kyessenov unfortunately the integration tests are flaking regularly on master. PTAL tomorrow to see if you can fix. Otherwise I will need to revert and I can help you fix if you can't figure it out. Thank you!

@kyessenov (Contributor Author):

I noticed it fail once locally. Could it be there is a delay between when listener is accepted and applied? I suspect there's some latency between when a filter config provider posts TLS and when a worker picks it up.

@kyessenov (Contributor Author):

(by listener applied I mean HCM receives the accepted filter config)

@kyessenov (Contributor Author):

OK, it seems to be some destruction sequence:


[2020-07-24 05:22:24.959][34][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #1: Envoy::Config::GrpcMuxImpl::queueDiscoveryRequest() [0xf32183]
[2020-07-24 05:22:24.962][34][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #2: Envoy::Config::GrpcMuxImpl::GrpcMuxWatchImpl::~GrpcMuxWatchImpl() [0xf35141]
[2020-07-24 05:22:24.965][34][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #3: Envoy::Config::GrpcMuxImpl::GrpcMuxWatchImpl::~GrpcMuxWatchImpl() [0xf3519e]
[2020-07-24 05:22:24.968][34][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #4: Envoy::Config::GrpcSubscriptionImpl::~GrpcSubscriptionImpl() [0xf3092a]
[2020-07-24 05:22:24.971][34][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #5: Envoy::Filter::Http::FilterConfigSubscription::~FilterConfigSubscription() [0xe7235b]
[2020-07-24 05:22:24.974][34][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #6: Envoy::Filter::Http::DynamicFilterConfigProviderImpl::~DynamicFilterConfigProviderImpl() [0xe70416]
[2020-07-24 05:22:24.977][34][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #7: Envoy::Filter::Http::DynamicFilterConfigProviderImpl::~DynamicFilterConfigProviderImpl() [0xe705ee]

I'll take a look.

@kyessenov (Contributor Author):

Merged master and it happens a lot more frequently. Must have been some recent changes.

@kyessenov (Contributor Author):

Let's see if #12268 fixes it. It passed with 1000 iterations locally.

tags = ["fails_on_windows"],
deps = [
":http_integration_lib",
"//source/extensions/filters/http/rbac:config",
Contributor:

cc @yanavlasov

I noticed a breakage during our import from OSS because we disable the rbac extension. The strong dependency on that extension that this change adds breaks compilation in our setup.

I think there's a problem with this test: it depends on an extension but does not use the extension build-system macros that remove the test from execution when the relevant extension is removed from the build. Since the test only uses the rbac extension as an example to exercise a config-loading mechanism, rather than testing the extension itself, excluding the test would mean losing coverage for the mechanism you introduced.

Should this test hand-craft its own test-only extension and use that to verify that the extension mechanism works as expected?

@kyessenov (Contributor Author) commented Jul 27, 2020:

I could do that, although not sure how valuable that is. I was operating under the assumption that RBAC is a core extension. I do need variable behavior for this test to make sense, so not many existing test extensions fit. Feel free to file an issue against me to clean this up.

Contributor:

This is technically the list of core extensions.

_core_extensions = [

If you do need to test specific behavior, please create a test filter with the behavior you need. Otherwise these tests will impact others who exclude the RBAC filter from their build.

KBaichoo pushed a commit to KBaichoo/envoy that referenced this pull request Jul 30, 2020
Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
chaoqin-li1123 pushed a commit to chaoqin-li1123/envoy that referenced this pull request Aug 7, 2020
Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Successfully merging this pull request may close these issues.

api: named filter config / filter config discovery service